The dominant multi-camera 3D detection paradigm is based on explicit 3D feature construction, which requires complicated indexing of local image-view features via 3D-to-2D projection. Other methods implicitly introduce geometric positional encoding and perform global attention (e.g., PETR) to build the relationship between image tokens and 3D objects. The 3D-to-2D perspective inconsistency and global attention lead to a weak correlation between foreground tokens and queries, resulting in slow convergence. We propose Focal-PETR with instance-guided supervision and spatial alignment module to adaptively focus object queries on discriminative foreground regions. Focal-PETR additionally introduces a down-sampling strategy to reduce the consumption of global attention. Due to the highly parallelized implementation and down-sampling strategy, our model, without depth supervision, achieves leading performance on the large-scale nuScenes benchmark and a superior speed of 30 FPS on a single RTX3090 GPU. Extensive experiments show that our method outperforms PETR while consuming 3x fewer training hours. The code will be made publicly available.
translated by 谷歌翻译
基于学习的视觉探针计(VO)算法在常见的静态场景上实现了显着的性能,受益于高容量模型和大量注释的数据,但在动态,填充的环境中往往会失败。语义细分在估计摄像机动作之前主要用于丢弃动态关联,但以丢弃静态功能为代价,并且很难扩展到看不见的类别。在本文中,我们利用相机自我运动和运动分割之间的相互依赖性,并表明两者都可以在单个基于学习的框架中共同完善。特别是,我们提出了Dytanvo,这是第一个涉及动态环境的基于学习的VO方法。它需要实时两个连续的单眼帧,并以迭代方式预测相机的自我运动。我们的方法在现实世界动态环境中的最先进的VOUTESS的平均提高27.7%,甚至在动态视觉SLAM系统中进行竞争性,从而优化了后端的轨迹。在很多看不见的环境上进行的实验也证明了我们的方法的普遍性。
translated by 谷歌翻译
将回归系数融合到均匀组中可以揭示在每个组内共享共同值的系数。这种扩展均匀性降低了参数空间的内在尺寸,并释放统计学精度。我们提出并调查了一个名为$ l_0 $ -fusion的新的组合分组方法,这些方法可用于混合整数优化(MIO)。在统计方面,我们识别称为分组灵敏度的基本量,该基本量为恢复真实组的难度。我们展示$ l_0 $ -fusion在分组灵敏度的最弱需求下实现了分组一致性:如果违反了这一要求,则小组拼写的最低风险将无法收敛到零。此外,我们展示了在高维制度中,可以使用无需任何必要的统计效率损失的确保筛选特征,同时降低计算成本的校正特征耦合耦合的$ L_0 $ -Fusion。在算法方面,我们为$ l_0 $ -fusion提供了一个mio配方,以及温暖的开始策略。仿真和实际数据分析表明,在分组准确性方面,$ L_0 $ -FUSUS展示其竞争对手的优势。
translated by 谷歌翻译
我们的目标是从规定的行动类别中解决从规定的行动类别创造多元化和自然人动作视频的有趣但具有挑战性的问题。关键问题在于能够在视觉外观中综合多种不同的运动序列。在本文中通过两步过程实现,该两步处理维持内部3D姿势和形状表示,Action2Motion和Motion2Video。 Action2Motion随机生成规定的动作类别的合理的3D姿势序列,该类别由Motion2Video进行处理和呈现,以形成2D视频。具体而言,Lie代数理论从事人类运动学的物理法之后代表自然人动作;开发了一种促进输出运动的分集的时间变化自动编码器(VAE)。此外,给定衣服人物的额外输入图像,提出了整个管道以提取他/她的3D详细形状,并在视频中呈现来自不同视图的合理运动。这是通过改进从单个2D图像中提取3D人类形状和纹理,索引,动画和渲染的现有方法来实现这一点,以形成人类运动的2D视频。它还需要3D人类运动数据集的策策和成果进行培训目的。彻底的经验实验,包括消融研究,定性和定量评估表现出我们的方法的适用性,并展示了解决相关任务的竞争力,其中我们的方法的组成部分与最先进的方式比较。
translated by 谷歌翻译
人体步态是指不仅代表活动能力的每日运动,而且还可以用人类观察者或计算机来识别步行者。最近的研究表明,步态甚至传达了有关沃克情绪的信息。不同情绪状态中的个体可能显示出不同的步态模式。各种情绪和步态模式之间的映射为自动情绪识别提供了新的来源。与传统的情绪检测生物识别技术(例如面部表达,言语和生理参数)相比,步态是可以观察到的,更难以模仿,并且需要从该主题中进行较少的合作。这些优势使步态成为情感检测的有前途的来源。本文回顾了有关基于步态的情绪检测的当前研究,尤其是关于步态参数如何受到不同情绪状态的影响以及如何通过不同的步态模式识别情绪状态的研究。我们专注于情感识别过程中应用的详细方法和技术:数据收集,预处理和分类。最后,我们讨论了使用智能计算和大数据的最先进技术的状态来讨论高效有效的基于步态的情感识别的可能发展。
translated by 谷歌翻译
Partial differential equations (PDEs) are widely used for description of physical and engineering phenomena. Some key parameters involved in PDEs, which represents certain physical properties with important scientific interpretations, are difficult or even impossible to be measured directly. Estimation of these parameters from noisy and sparse experimental data of related physical quantities is an important task. Many methods for PDE parameter inference involve a large number of evaluations of numerical solution of PDE through algorithms such as finite element method, which can be time-consuming especially for nonlinear PDEs. In this paper, we propose a novel method for estimating unknown parameters in PDEs, called PDE-Informed Gaussian Process Inference (PIGPI). Through modeling the PDE solution as a Gaussian process (GP), we derive the manifold constraints induced by the (linear) PDE structure such that under the constraints, the GP satisfies the PDE. For nonlinear PDEs, we propose an augmentation method that transfers the nonlinear PDE into an equivalent PDE system linear in all derivatives that our PIGPI can handle. PIGPI can be applied to multi-dimensional PDE systems and PDE systems with unobserved components. The method completely bypasses the numerical solver for PDE, thus achieving drastic savings in computation time, especially for nonlinear PDEs. Moreover, the PIGPI method can give the uncertainty quantification for both the unknown parameters and the PDE solution. The proposed method is demonstrated by several application examples from different areas.
translated by 谷歌翻译
The proliferation of automatic faithfulness metrics for summarization has produced a need for benchmarks to evaluate them. While existing benchmarks measure the correlation with human judgements of faithfulness on model-generated summaries, they are insufficient for diagnosing whether metrics are: 1) consistent, i.e., decrease as errors are introduced into a summary, 2) effective on human-written texts, and 3) sensitive to different error types (as summaries can contain multiple errors). To address these needs, we present a benchmark of unfaithful minimal pairs (BUMP), a dataset of 889 human-written, minimally different summary pairs, where a single error (from an ontology of 7 types) is introduced to a summary from the CNN/DailyMail dataset to produce an unfaithful summary. We find BUMP complements existing benchmarks in a number of ways: 1) the summaries in BUMP are harder to discriminate and less probable under SOTA summarization models, 2) BUMP enables measuring the consistency of metrics, and reveals that the most discriminative metrics tend not to be the most consistent, 3) BUMP enables the measurement of metrics' performance on individual error types and highlights areas of weakness for future work.
translated by 谷歌翻译
Neuroimaging-based prediction methods for intelligence and cognitive abilities have seen a rapid development in literature. Among different neuroimaging modalities, prediction based on functional connectivity (FC) has shown great promise. Most literature has focused on prediction using static FC, but there are limited investigations on the merits of such analysis compared to prediction based on dynamic FC or region level functional magnetic resonance imaging (fMRI) times series that encode temporal variability. To account for the temporal dynamics in fMRI data, we propose a deep neural network involving bi-directional long short-term memory (bi-LSTM) approach that also incorporates feature selection mechanism. The proposed pipeline is implemented via an efficient GPU computation framework and applied to predict intelligence scores based on region level fMRI time series as well as dynamic FC. We compare the prediction performance for different intelligence measures based on static FC, dynamic FC, and region level time series acquired from the Adolescent Brain Cognitive Development (ABCD) study involving close to 7000 individuals. Our detailed analysis illustrates that static FC consistently has inferior prediction performance compared to region level time series or dynamic FC for unimodal rest and task fMRI experiments, and in almost all cases using a combination of task and rest features. In addition, the proposed bi-LSTM pipeline based on region level time series identifies several shared and differential important brain regions across task and rest fMRI experiments that drive intelligence prediction. A test-retest analysis of the selected features shows strong reliability across cross-validation folds. Given the large sample size from ABCD study, our results provide strong evidence that superior prediction of intelligence can be achieved by accounting for temporal variations in fMRI.
translated by 谷歌翻译
Underwater automatic target recognition (UATR) has been a challenging research topic in ocean engineering. Although deep learning brings opportunities for target recognition on land and in the air, underwater target recognition techniques based on deep learning have lagged due to sensor performance and the size of trainable data. This letter proposed a framework for learning the visual representation of underwater acoustic imageries, which takes a transformer-based style transfer model as the main body. It could replace the low-level texture features of optical images with the visual features of underwater acoustic imageries while preserving their raw high-level semantic content. The proposed framework could fully use the rich optical image dataset to generate a pseudo-acoustic image dataset and use it as the initial sample to train the underwater acoustic target recognition model. The experiments select the dual-frequency identification sonar (DIDSON) as the underwater acoustic data source and also take fish, the most common marine creature, as the research subject. Experimental results show that the proposed method could generate high-quality and high-fidelity pseudo-acoustic samples, achieve the purpose of acoustic data enhancement and provide support for the underwater acoustic-optical images domain transfer research.
translated by 谷歌翻译
在不同的运动模式之间切换(例如,楼梯上升/下降,坡道上升/下降)时,动力的假肢腿必须预见用户的意图。许多数据驱动的分类技术已经证明了预测用户意图的有希望的结果,但是这些意图预测模型对新主题的表现仍然不受欢迎。在其他域(例如,图像分类)中,通过从大型数据集(即预训练的模型)中使用先前学习的功能,然后将此学模型转移到可用的新任务中,可以提高转移学习的精度。在本文中,我们开发了一个基于人类运动数据集的内部受试者(受试者)和主体间(主体独立)验证的深卷卷神经网络。然后,我们使用剩下的主题中的一小部分(10%)将转移学习应用于主题独立的模型。我们比较了这三个模型的性能。我们的结果表明,转移学习(TL)模型的表现优于主题无关(IND)模型,并且与主题依赖性(DEP)模型(DEP错误:0.74 $ \ pm $ 0.002%,IND错误:11.59 $ \ \ PM $ 0.076%,TL错误:3.57 $ \ pm $ 0.02%,有10%的数据)。此外,正如预期的那样,随着剩余主题的更多数据的可用性,转移学习精度会提高。我们还通过各种传感器配置评估了意图预测系统的性能,这些传感器配置可能会在假肢应用程序中可用。我们的结果表明,假体的大腿IMU足以预测实践中的运动意图。
translated by 谷歌翻译